Why neural networks find simple solutions: The many regularizers of geometric complexity
In many contexts, simpler models are preferable to more complex ones, and controlling this model complexity is the goal of many methods in machine learning, such as regularization, hyperparameter tuning, and architecture design. In deep learning, it has been difficult to understand the underlying mechanisms of complexity control, since many traditional measures are not naturally suitable for deep neural networks. Here we develop the notion of geometric complexity, a measure of the variability of the model function computed using a discrete Dirichlet energy. Using a combination of theoretical arguments and empirical results, we show that many common training heuristics, such as parameter norm regularization, spectral norm regularization, flatness regularization, implicit gradient regularization, noise regularization, and the choice of parameter initialization, all act to control geometric complexity, providing a unifying framework in which to characterize the behavior of deep learning models.
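The quantity at the heart of this abstract, a discrete Dirichlet energy, can be made concrete as the mean squared Frobenius norm of the model's input-Jacobian over a batch. A minimal numpy sketch (the helper name and the finite-difference approximation are illustrative, not the paper's implementation):

```python
import numpy as np

def geometric_complexity(f, X, eps=1e-5):
    """Discrete Dirichlet energy of f over a batch X: the mean squared
    Frobenius norm of the input-Jacobian, approximated here by
    central finite differences."""
    X = np.atleast_2d(X)
    n, d = X.shape
    total = 0.0
    for x in X:
        cols = []
        for i in range(d):
            e = np.zeros(d)
            e[i] = eps
            cols.append((f(x + e) - f(x - e)) / (2 * eps))
        J = np.stack(cols, axis=-1)  # (output_dim, d)
        total += np.sum(J ** 2)
    return total / n

# Sanity check: a linear map's Jacobian is its weight matrix, so its
# geometric complexity equals ||W||_F^2 at every input.
W = np.array([[1.0, 2.0], [0.0, 3.0]])
f = lambda x: W @ x
X = np.random.default_rng(0).normal(size=(8, 2))
gc = geometric_complexity(f, X)
```

For a linear model the value is input-independent; for a trained network it varies over the data distribution, which is why the paper averages over a batch.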
Unified View of Matrix Completion under General Structural Constraints
Suriya Gunasekar, Arindam Banerjee, Joydeep Ghosh
Matrix completion problems have been widely studied under special low-dimensional structures, such as low rank or structure induced by decomposable norms. In this paper, we present a unified analysis of matrix completion under general low-dimensional structural constraints induced by any norm regularization. We consider two estimators for the general problem of structured matrix completion, and provide unified upper bounds on the sample complexity and the estimation error. Our analysis relies on generic chaining, and we establish two intermediate results of independent interest: (a) in characterizing the size or complexity of low-dimensional subsets in a high-dimensional ambient space, a certain partial complexity measure encountered in the analysis of matrix completion problems is characterized in terms of the well-understood complexity measure of Gaussian widths, and (b) it is shown that a form of restricted strong convexity holds for matrix completion problems under general norm regularization. Further, we provide several non-trivial examples of structures included in our framework, notably including the recently proposed spectral $k$-support norm.
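For the most common instance of this setup, nuclear-norm-regularized completion, the workhorse is the proximal operator of the nuclear norm (soft-thresholding of singular values). A small numpy sketch of a soft-impute-style iteration; the function names, step size, and regularization strength are illustrative choices, not the paper's estimators:

```python
import numpy as np

def nuclear_prox(Y, lam):
    """Proximal operator of lam * nuclear norm: soft-threshold the
    singular values of Y. This is the core step of SVT / soft-impute
    style solvers for norm-regularized matrix completion."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

rng = np.random.default_rng(1)
M = np.outer(rng.normal(size=6), rng.normal(size=5))  # rank-1 truth
mask = rng.random(M.shape) < 0.7                      # ~70% observed

X = np.zeros_like(M)
for _ in range(200):
    # Fill observed entries from the data, keep current estimate on
    # the unobserved ones, then shrink toward low rank.
    X = nuclear_prox(X + mask * (M - X), lam=0.1)
```

The point of the paper's framework is that the same template applies with the prox of any norm, not just the nuclear norm, with sample complexity governed by the Gaussian width of the associated structured set.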
Estimation with Norm Regularization
Analysis of estimation error and associated structured statistical recovery based on norm-regularized regression, e.g., the Lasso, needs to consider four aspects: the norm, the loss function, the design matrix, and the noise vector. This paper generalizes such estimation error analysis along all four aspects relative to the existing literature. We characterize the restricted error set, establish relations between error sets for the constrained and regularized problems, and present an estimation error bound applicable to *any* norm. Precise characterizations of the bound are presented for a variety of noise vectors and design matrices, including sub-Gaussian, anisotropic, and dependent samples, and for loss functions including least squares and generalized linear models. Gaussian widths, as a measure of the size of suitable sets, and associated tools play a key role in our generalized analysis.
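Gaussian width, the complexity measure driving these bounds, is easy to estimate by Monte Carlo: $w(S) = \mathbb{E}\,\sup_{s \in S} \langle g, s \rangle$ for $g \sim N(0, I_d)$. Small width means a random Gaussian direction correlates weakly with the set, which is why the $\ell_1$ ball behind the Lasso is "small" in high dimensions. A numpy sketch (the function name and sample sizes are my own illustration):

```python
import numpy as np

def gaussian_width(sup_fn, d, n_samples=20000, seed=0):
    """Monte Carlo estimate of w(S) = E[sup_{s in S} <g, s>],
    given a routine computing the supremum for a fixed g."""
    g = np.random.default_rng(seed).normal(size=(n_samples, d))
    return np.mean([sup_fn(gi) for gi in g])

d = 100
# Unit l1 ball: sup_{||s||_1 <= 1} <g, s> = ||g||_inf,
# so w grows only like sqrt(2 log d).
w_l1 = gaussian_width(lambda g: np.max(np.abs(g)), d)
# Unit l2 ball: the sup is ||g||_2, so w grows like sqrt(d).
w_l2 = gaussian_width(np.linalg.norm, d)
```

The gap between the two estimates (roughly $\sqrt{2\log d}$ versus $\sqrt{d}$) is exactly the sample-complexity saving that sparsity-inducing norms buy in the bounds above.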
7 Appendix
Additional experimental results are presented from Section 7.12 onward. In Section 5.2, "Alignment of adversarial perturbations with singular vectors", and in Section 4.2, we saw that it is the dominant singular vector, the one corresponding to the largest singular value, that determines the optimal adversarial perturbation to the Jacobian and hence the maximal amount of signal gain that can be induced when propagating a perturbation. The gradient of the loss w.r.t. the logits of the classifier plays the role of $z$ in the derivation; the rest is careful notation. By Hölder's inequality, for non-zero $z$ and $v$ with conjugate exponents $1/p + 1/q = 1$, we have $z^\top v / \|v\|_p \le \|z\|_q$ (40); see the comment after Equation 1.1 in [21]. Moreover, if $v$ is of the form $v = \operatorname{sign}(z) \odot |z|^{q-1}$, the bound is attained, where we have used that $(q-1)p = q$. Thus, the numerator (and hence the direction) remains the same.
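The Hölder step is easy to verify numerically: for conjugate exponents, the ratio $z^\top v / \|v\|_p$ is maximized at $v = \operatorname{sign}(z)\odot|z|^{q-1}$, where it equals $\|z\|_q$. A short numpy check (variable names are my own):

```python
import numpy as np

p, q = 3.0, 1.5  # conjugate exponents: 1/3 + 1/1.5 = 1
rng = np.random.default_rng(0)
z = rng.normal(size=10)

def lp(x, r):
    """The l_r norm of x."""
    return np.sum(np.abs(x) ** r) ** (1.0 / r)

# The maximizing direction from the derivation above.
v_star = np.sign(z) * np.abs(z) ** (q - 1)
ratio_at_vstar = z @ v_star / lp(v_star, p)

# Any other v attains at most the same value, by Holder's inequality.
v_rand = rng.normal(size=10)
ratio_at_vrand = z @ v_rand / lp(v_rand, p)
```

The identity $(q-1)p = q$ is what makes the algebra close: $z^\top v^\star = \sum |z|^q$ while $\|v^\star\|_p = (\sum |z|^q)^{1/p}$, so the ratio collapses to $\|z\|_q$.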
Data-Driven and Theory-Guided Pseudo-Spectral Seismic Imaging Using Deep Neural Network Architectures
Full Waveform Inversion (FWI) reconstructs high-resolution subsurface models via multivariate optimization but faces challenges with solver selection and data availability. Deep Learning (DL) offers a promising alternative, bridging data-driven and physics-based methods. While DL-based FWI has been explored in the time domain, the pseudo-spectral approach remains underutilized, despite its success in classical FWI. This thesis integrates pseudo-spectral FWI into DL, formulating both data-driven and theory-guided approaches using Deep Neural Networks (DNNs) and Recurrent Neural Networks (RNNs). These methods were theoretically derived, tested on synthetic and Marmousi datasets, and compared with deterministic and time-domain approaches. Results show that data-driven pseudo-spectral DNNs outperform classical FWI in deeper and over-thrust regions due to their global approximation capability. Theory-guided RNNs yield greater accuracy, with lower error and better fault identification. While DNNs excel in velocity contrast recovery, RNNs provide superior edge definition and stability in shallow and deep sections. Beyond enhancing FWI performance, this research identifies broader applications of DL-based inversion and outlines future directions for these frameworks.
Reviews: Learning Low-Dimensional Metrics
Summary of the paper: The paper considers the problem of learning a low-rank / sparse and low-rank Mahalanobis distance under relative constraints dist(x_i,x_j) < dist(x_i,x_k), in the framework of regularized empirical risk minimization using trace norm / l_{1,2} norm regularization. The contributions are three theorems that provide: 1) an upper bound on the estimation error of the empirical risk minimizer; 2) a minimax lower bound on the error under the log loss function associated with the model, showing near-optimality of empirical risk minimization in this case; 3) an upper bound on the deviation of the learned Mahalanobis distance from the true one (in terms of Frobenius norm) under the log loss function associated with the model. Quality: My main concern here is the close resemblance to Jain et al. (2016). Big parts of the present paper have a one-to-one correspondence to parts of that paper. Jain et al. study the problem of learning a Gram matrix G given relative constraints dist(x_i,x_j) < dist(x_i,x_k), but without being given the coordinates of the points x_i (the ordinal embedding problem).
Nuclear Norm Regularization for Deep Learning
Scarvelis, Christopher, Solomon, Justin
Penalizing the nuclear norm of a function's Jacobian encourages it to locally behave like a low-rank linear map. Such functions vary locally along only a handful of directions, making the Jacobian nuclear norm a natural regularizer for machine learning problems. However, this regularizer is intractable for high-dimensional problems, as it requires computing a large Jacobian matrix and taking its singular value decomposition. We show how to efficiently penalize the Jacobian nuclear norm using techniques tailor-made for deep learning. We prove that for functions parametrized as compositions $f = g \circ h$, one may equivalently penalize the average squared Frobenius norm of $Jg$ and $Jh$. We then propose a denoising-style approximation that avoids the Jacobian computations altogether. Our method is simple, efficient, and accurate, enabling Jacobian nuclear norm regularization to scale to high-dimensional deep learning problems. We complement our theory with an empirical study of our regularizer's performance and investigate applications to denoising and representation learning.
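The composition result has a classical matrix-level analogue worth seeing concretely: $\|A\|_* = \min_{A=BC} \tfrac{1}{2}(\|B\|_F^2 + \|C\|_F^2)$, with the minimum attained by the balanced SVD factorization. A numpy check (variable names are my own; this illustrates the identity, not the paper's deep-learning method):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 4))
nuc = np.sum(np.linalg.svd(A, compute_uv=False))  # nuclear norm

# Balanced factorization B = U sqrt(S), C = sqrt(S) Vt attains the min.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
B = U * np.sqrt(s)                 # columns scaled by sqrt(s)
C = np.sqrt(s)[:, None] * Vt       # rows scaled by sqrt(s)
balanced = 0.5 * (np.sum(B ** 2) + np.sum(C ** 2))

# Any other factorization of A upper-bounds the nuclear norm.
G = rng.normal(size=(4, 4))        # invertible w.p. 1
B2, C2 = B @ G, np.linalg.solve(G, C)
other = 0.5 * (np.sum(B2 ** 2) + np.sum(C2 ** 2))
```

This is why penalizing the average squared Frobenius norms of $Jg$ and $Jh$ for $f = g \circ h$ can stand in for the (intractable) nuclear norm of $Jf$: the Frobenius surrogate is minimized exactly when the factorization is balanced, where it equals the nuclear norm.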